Ribosomal proteins that form part of the ribosome complex.
huang@MuyangnoMBP ~ % uname -a
Darwin MuyangnoMBP 19.4.0 Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64 x86_64
huang@MuyangnoMBP ~ % sw_vers
ProductName: Mac OS X
ProductVersion: 10.15.4
BuildVersion: 19E287
R version 4.0.0 (2020-04-24) -- "Arbor Day"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)
[R.app GUI 1.71 (7827) x86_64-apple-darwin17.0]
R.app GUI 1.71 (7827 Catalina build), S. Urbanek & H.-J. Bibiko, © R Foundation for Statistical Computing, 2016
RStudio 1.2.5042, © 2009-2020 RStudio, Inc.
Chimera-1.14-mac64, Nov. 13, 2019, © 2019 Regents of the University of California. All Rights Reserved.
Retrieve the amino acid sequence data with the R package SeqinR.
Load seqinr and the other required packages:
# load the R packages
library(seqinr)
library(Biostrings)
library(msa)
library(ape)
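The code that builds `seqs_S7` and `seqnames_S7` (and the L2 and L5 equivalents) is not shown here. The alignment rows are labelled with bare UniProt accessions such as P21469, while the Swiss-Prot headers read `sp|P21469|RS7_BACSU ...`, so the names were presumably cut down to the accession field. Below is a minimal base-R sketch of that step. It assumes headers in that format, and `uniprot_accession()` is a hypothetical helper, not part of seqinr.

```r
# Hypothetical helper: extract the accession (second "|" field) from
# Swiss-Prot/UniProt FASTA headers such as "sp|P21469|RS7_BACSU 30S ...".
uniprot_accession <- function(headers) {
  headers <- sub("^>", "", headers)           # drop a leading '>' if present
  ids <- sub("[[:space:]].*$", "", headers)   # keep only the first token
  fields <- strsplit(ids, "|", fixed = TRUE)  # "sp", "P21469", "RS7_BACSU"
  vapply(fields, function(f) f[2], character(1))
}

uniprot_accession(c(">sp|P21469|RS7_BACSU 30S ribosomal protein S7",
                    "sp|P22744|RS7_GEOSE 30S ribosomal protein S7"))
```

With seqinr loaded, something like `seqnames_S7 <- uniprot_accession(unlist(getAnnot(seqs_S7)))` would then give one accession per sequence, in the same order as `seqs_S7`.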
Answer the following questions for the 30S ribosomal protein S7. For each question, record your answer and what you typed into R to get this answer.
Q1. Calculate the genetic distances between > 3 protein sequences of interest. Which are the most closely related proteins, based on the genetic distances?
Write the sequences out to a FASTA file:
write.fasta(seqs_S7, seqnames_S7, file.out = "myseq_Assignment_S7.fasta")
Read the file back in as an AAStringSet (an XStringSet of amino acid sequences):
mySequences_S7 <- readAAStringSet(filepath = "myseq_Assignment_S7.fasta")
Run a multiple sequence alignment with ClustalW:
myAlignment_S7 <- msa(mySequences_S7, "ClustalW")
## use default substitution matrix
print(myAlignment_S7, show="complete")
##
## MsaAAMultipleAlignment with 6 rows and 239 columns
## aln (1..54) names
## [1] ----------------------------------MPRKGPVAKRDVLPDPI--- P21469
## [2] ----------------------------------MPRRGPVAKRDVLPDPI--- P22744
## [3] ----------------------------------MPRRRVIGQRKILPDPK--- P02359
## [4] ----------------------------------MARRRRAEVRQLQPDLV--- P17291
## [5] MSAEDTPEADADAAEESEPETARAKLFGEWDITDIEYSDPSTERYITVTPI--A P32552
## [6] ------------------MELDEIKVFGRWSTKDVVVKDPGLRNYINLTPIYVP P54063
## Con ----------------------------------MPRR?P???R?ILPDPI--- Consensus
##
## aln (55..108) names
## [1] -------------YNSKLVSRLINKMMI----DGKKGKSQTILYKSFDIIKERT P21469
## [2] -------------YNSKLVTRLINKIMI----DGKKSKAQKILYTAFDIIRERT P22744
## [3] -------------FGSELLAKFVNILMV----DGKKSTAESIVYSALETLAQRS P02359
## [4] -------------YGDVLVTAFINKIMR----DGKKNLAARIFYDACKIIQEKT P17291
## [5] HTMGRHADKQFKKSEISIVERLINRLMQTDENTGKKQLATSIVTEAFELVHERT P32552
## [6] HTAGRYTKRQFEKAKMNIVERLVNKVMRREENTGKKLKALKIVENAFEIIEKRT P54063
## Con -------------Y?S?LV?RLINK?M?----DGKK?KA??IVY?AFEII?ERT Consensus
##
## aln (109..162) names
## [1] GNDAMEVFEQALKNIMPVLEVKARRVGGANYQVPVEVRPERRTTLGLRWLVN-- P21469
## [2] GKDPMEVFEQALKNVMPVLEVRARRVGGANYQVPVEVRPDRRVSLGLRWLVQ-- P22744
## [3] GKSELEAFEVALENVRPTVEVKSRRVGGSTYQVPVEVRPVRRNALAMRWIVE-- P02359
## [4] GQEPLKVFKQAVENVKPRMEVRSRRVGGANYQVPMEVSPRRQQSLALRWLVQ-- P17291
## [5] DENPIQVLVSAVENSAPREETVRLKYGGISVPKAVDVAPQRRVDQALKFLAEGV P32552
## [6] KQNPIQVLVDAIENAGPREDTTRISYGGIVYLQSVDCSSLRRIDVALRNIALGA P54063
## Con G??P?EVFEQALENV?PR?EV??RRVGGANYQVPVEVRP?RR??LALRWLV?-- Consensus
##
## aln (163..216) names
## [1] -YARLRGEKTMEERLANEILDAAN---NTGAAVKKREDTHKMAEANKAFAHYRW P21469
## [2] -YARLRNEKTMEERLANEIMDAAN---NTGAAVKKREDTHKMAEANKAFAHYRW P22744
## [3] -AARKRGDKSMALRLANELSDAAE---NKGTAVKKREDVHRMAEANKAFAHYRW P02359
## [4] -AANQRPERRAAVRIAHELMDAAE---GKGGAVKKKEDVERMAEANRAYAHYRW P17291
## [5] YGGSFKTTTTAAEALAQQLIGAANDDVQT-YAVNQKEEKERVAAAAR------- P32552
## [6] YMAAHKSKKPIEEALAEEIIAAARGDMQKSYAVRKKEETERVAQSAR------- P54063
## Con -?AR?R?EKTM?ERLANE??DAAN---N?G?AVKK?EDT?RMAEAN?AFAHYRW Consensus
##
## aln (217..239) names
## [1] ----------------------- P21469
## [2] ----------------------- P22744
## [3] LSLRSFSHQAGASSKQPALGYLN P02359
## [4] ----------------------- P17291
## [5] ----------------------- P32552
## [6] ----------------------- P54063
## Con ----------------------- Consensus
Consensus sequence of the S7 alignment, visualized with Chimera.
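The `Con` row of the printed alignment is a column-by-column consensus. It shows a residue where one residue dominates, `-` where gaps dominate, and `?` otherwise. The base-R sketch below approximates this with a majority rule. It assumes a character is reported when it is the single most frequent one and fills at least half the rows; msa's own consensus code may use different thresholds. On columns 35 to 51 of the S7 alignment it returns the same string as the `Con` row, `MPRR?P???R?ILPDPI`.

```r
# Approximate majority-rule consensus for an alignment given as equal-length strings.
# A column's most frequent character (gaps included) is reported only when it is
# unique and occupies at least `threshold` of the rows; otherwise "?" is reported.
majority_consensus <- function(aligned, threshold = 0.5) {
  if (length(unique(nchar(aligned))) != 1) {
    stop("all aligned sequences must have the same length")
  }
  chars <- do.call(rbind, strsplit(aligned, ""))  # one row per sequence
  cons <- apply(chars, 2, function(column) {
    counts <- table(column)
    best <- max(counts)
    if (sum(counts == best) == 1 && best / length(column) >= threshold) {
      names(counts)[which.max(counts)]
    } else {
      "?"
    }
  })
  paste(cons, collapse = "")
}

# Columns 35 to 51 of the six S7 sequences, copied from the alignment above
s7_fragment <- c("MPRKGPVAKRDVLPDPI", "MPRRGPVAKRDVLPDPI", "MPRRRVIGQRKILPDPK",
                 "MARRRRAEVRQLQPDLV", "IEYSDPSTERYITVTPI", "VVVKDPGLRNYINLTPI")
majority_consensus(s7_fragment)
```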
Write the alignment (an XStringSet object) to a file:
writeXStringSet(unmasked(myAlignment_S7), filepath = "myaln_Assignment_S7.fasta")
Read the FASTA-format alignment into R:
myaln_S7 <- read.alignment(file = "myaln_Assignment_S7.fasta", format = "fasta")
Calculate the genetic distances between the protein sequences:
mydist_S7 <- dist.alignment(myaln_S7)
mydist_S7
## P21469 P22744 P02359 P17291 P32552
## P22744 0.1961161
## P02359 0.4160251 0.4082483
## P17291 0.4311582 0.4082483 0.4236593
## P32552 0.5452498 0.5067117 0.5870218 0.5263336
## P54063 0.5616371 0.5309230 0.6075587 0.5792844 0.4436311
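By default `dist.alignment()` uses `matrix = "similarity"`. The seqinr documentation describes the reported value as the square root of the pairwise distance, d = sqrt(1 - s), where s is the fraction of aligned positions counted as similar. Inverting this gives s = 1 - d^2. The base-R sketch below applies that conversion to the smallest distance from each of the three analyses in this document, expressing each as a percentage.

```r
# Smallest pairwise distances printed in this document
# (B. subtilis vs. G. stearothermophilus, for S7, L2 and L5).
d_min <- c(S7 = 0.1961161, L2 = 0.2170287, L5 = 0.1830835)

# dist.alignment() reports d = sqrt(1 - s), so s = 1 - d^2
similarity_pct <- round(100 * (1 - d_min^2), 1)
similarity_pct
```

This gives roughly 96.2, 95.3 and 96.6 percent for S7, L2 and L5. A distance of about 0.2 therefore already corresponds to a very close pair.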
Get the sequence annotations:
unlist(getAnnot(seqs_S7))
## [1] "sp|P02359|RS7_ECOLI 30S ribosomal protein S7 OS=Escherichia coli (strain K12) OX=83333 GN=rpsG PE=1 SV=3"
## [2] "sp|P21469|RS7_BACSU 30S ribosomal protein S7 OS=Bacillus subtilis (strain 168) OX=224308 GN=rpsG PE=1 SV=4"
## [3] "sp|P17291|RS7_THET8 30S ribosomal protein S7 OS=Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579) OX=300852 GN=rpsG PE=1 SV=3"
## [4] "sp|P22744|RS7_GEOSE 30S ribosomal protein S7 OS=Geobacillus stearothermophilus OX=1422 GN=rpsG PE=1 SV=3"
## [5] "sp|P32552|RS7_HALMA 30S ribosomal protein S7 OS=Haloarcula marismortui (strain ATCC 43049 / DSM 3752 / JCM 8966 / VKM B-1809) OX=272569 GN=rps7 PE=1 SV=2"
## [6] "sp|P54063|RS7_METJA 30S ribosomal protein S7 OS=Methanocaldococcus jannaschii (strain ATCC 43067 / DSM 2661 / JAL-1 / JCM 10045 / NBRC 100440) OX=243232 GN=rps7 PE=3 SV=1"
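The annotations come back in a different order (E. coli first) from the alignment and the distance matrix, which are labelled by accession. The sketch below builds an accession-to-species lookup from headers in this format. This lets accession labels be translated into species names, for example to relabel tree tips with `mytree_S7$tip.label <- species[mytree_S7$tip.label]`. `organism_lookup()` is a hypothetical base-R helper, and it assumes every header contains `OS=` and `OX=` fields as above.

```r
# Hypothetical helper: named vector mapping UniProt accession -> species,
# taken from the OS= field of headers like the ones printed above.
organism_lookup <- function(headers) {
  acc <- sub("^>?[a-z]+\\|([^|]+)\\|.*$", "\\1", headers)        # "sp|P02359|..." -> "P02359"
  os  <- sub("^.* OS=(.*?) OX=.*$", "\\1", headers, perl = TRUE) # text between OS= and OX=
  os  <- sub(" \\(strain.*\\)$", "", os)                         # drop "(strain ...)" qualifiers
  setNames(os, acc)
}

species <- organism_lookup(c(
  "sp|P02359|RS7_ECOLI 30S ribosomal protein S7 OS=Escherichia coli (strain K12) OX=83333 GN=rpsG PE=1 SV=3",
  "sp|P22744|RS7_GEOSE 30S ribosomal protein S7 OS=Geobacillus stearothermophilus OX=1422 GN=rpsG PE=1 SV=3"
))
species
```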
Bacillus subtilis (P21469) and Geobacillus stearothermophilus (P22744) are the most closely related proteins, based on the genetic distances: at 0.196, theirs is the smallest value in the matrix.
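Reading the closest pair off the printed matrix by eye works for six sequences, but for larger sets it can be done programmatically. The sketch below rebuilds the S7 matrix from the values printed above so that it runs on its own. With the real object, `closest_pair(mydist_S7)` should work the same way, since `dist.alignment()` returns a `dist` object. `closest_pair()` is a hypothetical helper.

```r
# Lower triangle of the S7 distance matrix printed above, column by column,
# rebuilt as a "dist" object so the sketch runs on its own.
labs <- c("P21469", "P22744", "P02359", "P17291", "P32552", "P54063")
d_S7 <- structure(
  c(0.1961161, 0.4160251, 0.4311582, 0.5452498, 0.5616371,
    0.4082483, 0.4082483, 0.5067117, 0.5309230,
    0.4236593, 0.5870218, 0.6075587,
    0.5263336, 0.5792844,
    0.4436311),
  Size = length(labs), Labels = labs, Diag = FALSE, Upper = FALSE, class = "dist"
)

# Return the pair of labels with the smallest off-diagonal distance
closest_pair <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf                                   # ignore self-distances
  idx <- which(m == min(m), arr.ind = TRUE)[1, ]   # first hit; ties are possible
  list(pair = c(rownames(m)[idx[["row"]]], colnames(m)[idx[["col"]]]),
       distance = min(m))
}

closest_pair(d_S7)
```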
Q2. Build an unrooted phylogenetic tree of the proteins, using the neighbour-joining algorithm. Which are the most closely related proteins, based on the tree?
# construct a phylogenetic tree with the neighbor joining algorithm
mytree_S7 <- nj(mydist_S7)
plot.phylo(mytree_S7, type="unrooted")
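Neighbour-joining, which `nj()` implements, does not simply merge the pair with the smallest distance. At each step it merges the pair minimizing Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k), which corrects for how far each taxon is from all the others. The base-R sketch below covers the first step only (it is neither a full implementation nor ape's code) and applies it to the S7 distances printed above. On these values the first merge is the two archaeal proteins P32552 and P54063 (Q ≈ -3.557), slightly ahead of P21469 and P22744 (Q ≈ -3.416). This does not contradict the answer above: being merged first is not required for a pair to end up as close relatives in the final tree.

```r
# Lower triangle of the S7 distance matrix printed above, column by column.
labs <- c("P21469", "P22744", "P02359", "P17291", "P32552", "P54063")
d_S7 <- structure(
  c(0.1961161, 0.4160251, 0.4311582, 0.5452498, 0.5616371,
    0.4082483, 0.4082483, 0.5067117, 0.5309230,
    0.4236593, 0.5870218, 0.6075587,
    0.5263336, 0.5792844,
    0.4436311),
  Size = length(labs), Labels = labs, Diag = FALSE, Upper = FALSE, class = "dist"
)

# Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
nj_q_matrix <- function(d) {
  m <- as.matrix(d)
  n <- nrow(m)
  r <- rowSums(m)                      # total distance from each taxon to all others
  q <- (n - 2) * m - outer(r, r, "+")
  diag(q) <- NA                        # a taxon is never paired with itself
  q
}

# Pair of taxa that neighbour-joining would merge first
first_nj_join <- function(d) {
  q <- nj_q_matrix(d)
  idx <- which(q == min(q, na.rm = TRUE), arr.ind = TRUE)[1, ]
  c(rownames(q)[idx[["row"]]], colnames(q)[idx[["col"]]])
}

q_S7 <- nj_q_matrix(d_S7)
round(q_S7["P21469", "P22744"], 3)
round(q_S7["P32552", "P54063"], 3)
first_nj_join(d_S7)
```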
Bacillus subtilis (P21469) and Geobacillus stearothermophilus (P22744) are the most closely related proteins, based on the tree.
Q3. Build a rooted phylogenetic tree of the proteins, using an outgroup. Which are the most closely related proteins, based on the tree? What extra information does this tree tell you, compared to the unrooted tree in Q2?
mytree_S7 <- root(mytree_S7, outgroup = "P54063", resolve.root = TRUE)
plot.phylo(mytree_S7, main = "Phylogenetic Tree")
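What rooting adds can be checked directly with ape. An unrooted tree only says which tips are separated by which branches. Once `root()` places the root on the branch to the outgroup, clades and common ancestors become meaningful, and `is.monophyletic()` can test whether a set of tips forms a clade. The sketch below uses a hypothetical toy topology with placeholder tip names (not the trees estimated here) and assumes ape is installed.

```r
library(ape)

# Hypothetical unrooted toy tree: the basal node has three branches.
toy <- read.tree(text = "(Out,(A,B),(C,D));")
is.rooted(toy)                          # FALSE

# Root on the branch leading to the outgroup, as done above with P54063
rooted <- root(toy, outgroup = "Out", resolve.root = TRUE)
is.rooted(rooted)                       # TRUE

# In the rooted tree, ask whether A and B form a clade
is.monophyletic(rooted, c("A", "B"))    # TRUE
```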
Bacillus subtilis (P21469) and Geobacillus stearothermophilus (P22744) are still the most closely related proteins. The rooted tree adds the direction of descent: with the Methanocaldococcus jannaschii protein (P54063) as the outgroup, Escherichia coli (P02359) shares a more recent common ancestor with B. subtilis and G. stearothermophilus than with Thermus thermophilus (P17291).
Answer the following questions for the 50S ribosomal protein L2. For each question, record your answer and what you typed into R to get this answer.
Q1. Calculate the genetic distances between > 3 protein sequences of interest. Which are the most closely related proteins, based on the genetic distances?
Write the sequences out to a FASTA file:
write.fasta(seqs_L2, seqnames_L2, file.out = "myseq_Assignment_L2.fasta")
Read the file back in as an AAStringSet (an XStringSet of amino acid sequences):
mySequences_L2 <- readAAStringSet(filepath = "myseq_Assignment_L2.fasta")
Run a multiple sequence alignment with ClustalW:
myAlignment_L2 <- msa(mySequences_L2, "ClustalW")
## use default substitution matrix
print(myAlignment_L2, show="complete")
##
## MsaAAMultipleAlignment with 6 rows and 283 columns
## aln (1..54) names
## [1] MAIKKYKPTSNGRRGMTTSDFAEITTDKPEKSLLAPLHKKGGRNNQGKLTVRHQ P42919
## [2] MAIKKYKPTSNGRRGMTVLDFSEITTDQPEKSLLAPLKKRAGRNNQGKITVRHQ P04257
## [3] MAVKKFKPYTPSRRFMTVADFSEITKTEPEKSLVKPLKKTGGRNNQGRITVRFR P60405
## [4] MAVVKCKPTSPGRRHVVKVVNPELHKGKPFAPLLEKNSKSGGRNNNGRITTRHI P60422
## [5] ----------MGRR-------------------IQGQRRGRGTSTFRAPSHRYK P20276
## [6] ----------MGKR-------------------LISQRRGRGSSVYTCPSHKRR P54017
## Con MA?KK?KPTS?GRR?MT??DF?EIT???PEKSLL?PL?K?GGRNNQG?ITVRH? Consensus
##
## aln (55..108) names
## [1] GGGHKRQYRVIDFKR-DKDGIPGRVATVEYDPNRSANIALINYADGEKRYILAP P42919
## [2] GGGHKRQYRIIDFKR-DKDGIPGRVATIEYDPNRSANIALINYADGEKRYIIAP P04257
## [3] GGGHKRLYRIIDFKRWDKVGIPAKVAAIEYDPNRSARIALLHYVDGEKRYIIAP P60405
## [4] GGGHKQAYRIVDFKR-NKDGIPAVVERLEYDPNRSANIALVLYKDGERRYILAP P60422
## [5] ADLEHR---KVEDGD----VIAGTVVDIEHDPARSAPVAAVEFEDGDRRLILAP P20276
## [6] GEAKYRRFDELEKKG----KVLGKIVDILHDPGRSAPVAKVEYETGEEGLLVVP P54017
## Con GGGHKR?YRIIDFKR-DKDGIPG?VA?IEYDPNRSANIALV?Y?DGEKRYILAP Consensus
##
## aln (109..162) names
## [1] KGIQVGTEIMSGPEADIKVGNALPLINIPVGTVVHNIELKPGKGGQLVRSAGTS P42919
## [2] KNLKVGMEIMSGPDADIKIGNALPLENIPVGTLVHNIELKPGRGGQLVRAAGTS P04257
## [3] DGLQVGQQVVAGPDAPIQVGNALPLRFIPVGTVVHAVELEPKKGAKLARAAGTS P60405
## [4] KGLKAGDQIQSGVDAAIKPGNTLPMRNIPVGSTVHNVEMKPGKGGQLARSAGTY P60422
## [5] EGVGVGDELQVGVSAEIAPGNTLPLAEIPEGVPVCNVESSPGDGGKFARASGVN P20276
## [6] EGVKVGDIIECGVSAEIKPGNILPLGAIPEGIPVFNIETVPGDGGKLVRAGGCY P54017
## Con KGLKVGDEI?SG?DA?IKPGNALPL?NIPVGT?VHN?ELKPGKGG?L?RAAGTS Consensus
##
## aln (163..216) names
## [1] AQVLGKEGKYVLVRLNSGEVRMILSACRASIGQVGNEQHELINIGKAGRSRWKG P42919
## [2] AQVLGKEGKYVIVRLASGEVRMILGKCRATVGEVGNEQHELVNIGKAGRARWLG P04257
## [3] AQIQGREGDYVILRLPSGELRKVHGECYATVGAVGNADHKNIVLGKAGRSRWLG P60405
## [4] VQIVARDGAYVTLRLRSGEMRKVEADCRATLGEVGNAEHMLRVLGKAGAARWRG P60422
## [5] AQLLTHDRNVAVVKLPSGEMKRLDPQCRATIGVVAGGGRTDKPFVKAGNKHHKM P20276
## [6] AHILTHDGERTYVKLPSGHIKALHSMCRATIGVVAGGGRKEKPFVKAGKKYHAM P54017
## Con AQILG??G?YV?VRLPSGE?R?????CRATIG?VGN??H?L???GKAGR?RW?G Consensus
##
## aln (217..270) names
## [1] IR-----PTVRGSVMNPNDHPHGGGEGRAPIGRKSPMSPWGKPTLGFKTRKKKN P42919
## [2] IR-----PTVRGSVMNPVDHPHGGGEGKAPIGRKSPMTPWGKPTLGYKTRKKKN P04257
## [3] RR-----PHVRGAAMNPVDHPHGGGEGRAPRGR-PPASPWGWQTKGLKTRKRRK P60405
## [4] VR-----PTVRGTAMNPVDHPHGGGEGRN-FGK-HPVTPWGVQTKGKKTRSNKR P60422
## [5] KARGTKWPNVRGVAMNAVDHPFGGG------GRQHPGKPKSISRN-APPGRKVG P20276
## [6] KAKAVKWPRVRGVAMNAVDHPFGGG------RHQHTGKPTTVSRKKVPPGRKVG P54017
## Con ?R-----PTVRG?AMNPVDHPHGGGEGRA??GR?HP??PWG??TKG?KTRKKK? Consensus
##
## aln (271..283) names
## [1] KSDKFIVRRRKNK P42919
## [2] KSDKFIIRRRKK- P04257
## [3] PSSRFIIARRKK- P60405
## [4] -TDKFIVRRRSK- P60422
## [5] DIASKRTGRGGNE P20276
## [6] HISARRTGVRK-- P54017
## Con ?SDKFI?RRRKK- Consensus
Consensus sequence of the L2 alignment, visualized with Chimera.
Write the alignment (an XStringSet object) to a file:
writeXStringSet(unmasked(myAlignment_L2), filepath = "myaln_Assignment_L2.fasta")
Read the FASTA-format alignment into R:
myaln_L2 <- read.alignment(file = "myaln_Assignment_L2.fasta", format = "fasta")
Calculate the genetic distances between the protein sequences:
mydist_L2 <- dist.alignment(myaln_L2)
mydist_L2
## P42919 P04257 P60405 P60422 P20276
## P04257 0.2170287
## P60405 0.4045199 0.3813850
## P60422 0.4364358 0.4364358 0.4688072
## P20276 0.6084511 0.6132441 0.5859587 0.5872202
## P54017 0.6230455 0.6297813 0.6243641 0.6222813 0.4150529
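The L2 matrix contains an exact tie: the E. coli protein (P60422) is 0.4364358 from both P42919 and P04257. A helper that returns every nearest neighbour makes such ties visible, whereas `which.min()` would return only a single index. The base-R sketch below runs on the values printed above; `nearest_neighbours()` is a hypothetical name.

```r
# Lower triangle of the L2 distance matrix printed above, column by column.
labs <- c("P42919", "P04257", "P60405", "P60422", "P20276", "P54017")
d_L2 <- structure(
  c(0.2170287, 0.4045199, 0.4364358, 0.6084511, 0.6230455,
    0.3813850, 0.4364358, 0.6132441, 0.6297813,
    0.4688072, 0.5859587, 0.6243641,
    0.5872202, 0.6222813,
    0.4150529),
  Size = length(labs), Labels = labs, Diag = FALSE, Upper = FALSE, class = "dist"
)

# For each sequence, list all sequences at the minimum distance (ties kept)
nearest_neighbours <- function(d) {
  m <- as.matrix(d)
  diag(m) <- Inf                       # ignore self-distances
  lapply(setNames(seq_len(nrow(m)), rownames(m)),
         function(i) names(which(m[i, ] == min(m[i, ]))))
}

nearest_neighbours(d_L2)
```

The L5 matrix below shows the same pattern: P62399 is 0.3584573 from both P12877 and P08895.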
Get the sequence annotations:
unlist(getAnnot(seqs_L2))
## [1] "sp|P60422|RL2_ECOLI 50S ribosomal protein L2 OS=Escherichia coli (strain K12) OX=83333 GN=rplB PE=1 SV=2"
## [2] "sp|P42919|RL2_BACSU 50S ribosomal protein L2 OS=Bacillus subtilis (strain 168) OX=224308 GN=rplB PE=1 SV=3"
## [3] "sp|P60405|RL2_THET8 50S ribosomal protein L2 OS=Thermus thermophilus (strain HB8 / ATCC 27634 / DSM 579) OX=300852 GN=rplB PE=1 SV=3"
## [4] "sp|P04257|RL2_GEOSE 50S ribosomal protein L2 OS=Geobacillus stearothermophilus OX=1422 GN=rplB PE=1 SV=2"
## [5] "sp|P20276|RL2_HALMA 50S ribosomal protein L2 OS=Haloarcula marismortui (strain ATCC 43049 / DSM 3752 / JCM 8966 / VKM B-1809) OX=272569 GN=rpl2 PE=1 SV=4"
## [6] "sp|P54017|RL2_METJA 50S ribosomal protein L2 OS=Methanocaldococcus jannaschii (strain ATCC 43067 / DSM 2661 / JAL-1 / JCM 10045 / NBRC 100440) OX=243232 GN=rpl2 PE=3 SV=2"
Bacillus subtilis (P42919) and Geobacillus stearothermophilus (P04257) are the most closely related proteins, based on the genetic distances: at 0.217, theirs is the smallest value in the matrix.
Q2. Build an unrooted phylogenetic tree of the proteins, using the neighbour-joining algorithm. Which are the most closely related proteins, based on the tree?
# construct a phylogenetic tree with the neighbor joining algorithm
mytree_L2 <- nj(mydist_L2)
plot.phylo(mytree_L2, type="unrooted")
Bacillus subtilis (P42919) and Geobacillus stearothermophilus (P04257) are the most closely related proteins, based on the tree.
Q3. Build a rooted phylogenetic tree of the proteins, using an outgroup. Which are the most closely related proteins, based on the tree? What extra information does this tree tell you, compared to the unrooted tree in Q2?
mytree_L2 <- root(mytree_L2, outgroup = "P54017", resolve.root = TRUE)
plot.phylo(mytree_L2, main = "Phylogenetic Tree")
Bacillus subtilis (P42919) and Geobacillus stearothermophilus (P04257) are still the most closely related proteins. The rooted tree adds the direction of descent: with the Methanocaldococcus jannaschii protein (P54017) as the outgroup, Thermus thermophilus (P60405) shares a more recent common ancestor with B. subtilis and G. stearothermophilus than with Escherichia coli (P60422).
Answer the following questions for the 50S ribosomal protein L5. For each question, record your answer and what you typed into R to get this answer.
Q1. Calculate the genetic distances between > 3 protein sequences of interest. Which are the most closely related proteins, based on the genetic distances?
Write the sequences out to a FASTA file:
write.fasta(seqs_L5, seqnames_L5, file.out = "myseq_Assignment_L5.fasta")
Read the file back in as an AAStringSet (an XStringSet of amino acid sequences):
mySequences_L5 <- readAAStringSet(filepath = "myseq_Assignment_L5.fasta")
Run a multiple sequence alignment with ClustalW:
myAlignment_L5 <- msa(mySequences_L5, "ClustalW")
## use default substitution matrix
print(myAlignment_L5, show="complete")
##
## MsaAAMultipleAlignment with 6 rows and 213 columns
## aln (1..54) names
## [1] MNR---LKEKYNKEIAPALMTKFNYDSVMQVPKIEKIVINMGVGDAVQNAKAID P12877
## [2] MNR---LKEKYVKEVVPALMSKFNYKSIMQVPKIEKIVINMGVGDAVQNPKALD P08895
## [3] MPLDVALKRKYYEEVRPELIRRFGYQNVWEVPRLEKVVINQGLGEAKEDARILE P41201
## [4] MAK---LHDYYKDEVVKKLMTEFNYNSVMQVPRVEKITLNMGVGEAIADKKLLD P62399
## [5] ---------------MSSESESGGDFHEMREPRIEKVVVHMGIGHGGRD---LA P14124
## [6] ---------------MSFEELWQK--NPMLKPRIEKVVVNFGVGESGDR---LT P54040
## Con M??---LK?KY??EV?P?LM??FNY?SVMQVPRIEK?VINMGVGEA??D?K?LD Consensus
##
## aln (55..108) names
## [1] SAVEELTFIAGQKPVVTRAKKSIAGFRLREGMPIGAKVTLRGERMYDFLDKLIS P12877
## [2] SAVEELTLIAGQRPVVTRAKKSIAGFRLRQGMPIGAKVTLRGERMYEFLDKLIS P08895
## [3] KAAQELALITGQKPAVTRAKKSISNFKLRKGMPIGLRVTLRRDRMWIFLEKLLN P41201
## [4] NAAADLAAISGQKPLITKARKSVAGFKIRQGYPIGCKVTLRGERMWEFFERLIT P62399
## [5] NAEDILGEITGQMPVRTKAKRTVGEFDIREGDPIGAKVTLRDEMAEEFLQTALP P14124
## [6] KGAQVIEELTGQKPIRTRAKQTNPSFGIRKKLPIGLKVTLRGKKAEEFLKNAFE P54040
## Con ?AA?EL??ITGQKPVVTRAKKSIAGF??R?GMPIGAKVTLRGERM?EFL?KLI? Consensus
##
## aln (109..162) names
## [1] VSLPRVRDFRGVSKKSFDGRGNYTLGIKEQLIFPEIDYDKVTKVRGMDIVIVTT P12877
## [2] VSLPRVRDFRGVSKKAFDGRGNYTLGIKEQLIFPEIDYDKVNKVRGMDIVIVTT P08895
## [3] VALPRIRDFRGLNPNSFDGRGNYNLGLREQLIFPEITYDMVDALRGMDIAVVTT P41201
## [4] IAVPRIRDFRGLSAKSFDGRGNYSMGVREQIIFPEIDYDKVDRVRGLDITITTT P62399
## [5] LA--------ELATSQFDDTGNFSFGVEEHTEFPSQEYDPSIGIYGLDVTVNLV P14124
## [6] AFQ---KEGKKLYDYSFDDYGNFSFGIHEHIDFPGQKYDPMIGIFGMDVCVTLE P54040
## Con VALPR?RDFRGLS?KSFDGRGNYSLGI?EQLIFPEIDYDKV??VRGMDI??VTT Consensus
##
## aln (163..213) names
## [1] ANTDEEARELLTQVGMPFQK------------------------------- P12877
## [2] ANTDEEARELLALLGMPFQK------------------------------- P08895
## [3] AETDEEARALLELLGFPFRK------------------------------- P41201
## [4] AKSDEEGRALLAAFDFPFRK------------------------------- P62399
## [5] RPGYRVAKRDKASRSIPTKHRLNPADAVAFIESTYDVEVSE---------- P14124
## [6] RPGFRVKRRKRCRAKIPRRHRLTREEAIEFIEKTFGVKVERVLLEEEEETQ P54040
## Con A?TDEEAR?LLA??G?PFRK------------------------------- Consensus
Consensus sequence of the L5 alignment, visualized with Chimera.
Write the alignment (an XStringSet object) to a file:
writeXStringSet(unmasked(myAlignment_L5), filepath = "myaln_Assignment_L5.fasta")
Read the FASTA-format alignment into R:
myaln_L5 <- read.alignment(file = "myaln_Assignment_L5.fasta", format = "fasta")
Calculate the genetic distances between the protein sequences:
mydist_L5 <- dist.alignment(myaln_L5)
mydist_L5
## P12877 P08895 P41201 P62399 P14124
## P08895 0.1830835
## P41201 0.3737175 0.3811186
## P62399 0.3584573 0.3584573 0.3883787
## P14124 0.5937711 0.5937711 0.5937711 0.5773503
## P54040 0.5987408 0.6039701 0.6244494 0.6039701 0.4659859
Get the sequence annotations:
unlist(getAnnot(seqs_L5))
## [1] "sp|P62399|RL5_ECOLI 50S ribosomal protein L5 OS=Escherichia coli (strain K12) OX=83333 GN=rplE PE=1 SV=2"
## [2] "sp|P12877|RL5_BACSU 50S ribosomal protein L5 OS=Bacillus subtilis (strain 168) OX=224308 GN=rplE PE=1 SV=1"
## [3] "sp|P41201|RL5_THETH 50S ribosomal protein L5 OS=Thermus thermophilus OX=274 GN=rplE PE=1 SV=3"
## [4] "sp|P08895|RL5_GEOSE 50S ribosomal protein L5 OS=Geobacillus stearothermophilus OX=1422 GN=rplE PE=1 SV=1"
## [5] "sp|P14124|RL5_HALMA 50S ribosomal protein L5 OS=Haloarcula marismortui (strain ATCC 43049 / DSM 3752 / JCM 8966 / VKM B-1809) OX=272569 GN=rpl5 PE=1 SV=4"
## [6] "sp|P54040|RL5_METJA 50S ribosomal protein L5 OS=Methanocaldococcus jannaschii (strain ATCC 43067 / DSM 2661 / JAL-1 / JCM 10045 / NBRC 100440) OX=243232 GN=rpl5 PE=3 SV=1"
Bacillus subtilis (P12877) and Geobacillus stearothermophilus (P08895) are the most closely related proteins, based on the genetic distances: at 0.183, theirs is the smallest value in the matrix.
Q2. Build an unrooted phylogenetic tree of the proteins, using the neighbour-joining algorithm. Which are the most closely related proteins, based on the tree?
# construct a phylogenetic tree with the neighbor joining algorithm
mytree_L5 <- nj(mydist_L5)
plot.phylo(mytree_L5, type="unrooted")
Bacillus subtilis (P12877) and Geobacillus stearothermophilus (P08895) are the most closely related proteins, based on the tree.
Q3. Build a rooted phylogenetic tree of the proteins, using an outgroup. Which are the most closely related proteins, based on the tree? What extra information does this tree tell you, compared to the unrooted tree in Q2?
mytree_L5 <- root(mytree_L5, outgroup = "P54040", resolve.root = TRUE)
plot.phylo(mytree_L5, main = "Phylogenetic Tree")
Bacillus subtilis (P12877) and Geobacillus stearothermophilus (P08895) are still the most closely related proteins. The rooted tree adds the direction of descent: with the Methanocaldococcus jannaschii protein (P54040) as the outgroup, Escherichia coli (P62399) shares a more recent common ancestor with B. subtilis and G. stearothermophilus than with Thermus thermophilus (P41201).
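The S7, L2 and L5 analyses repeat the same commands; only the protein tag and the outgroup change. The sketch below bundles those steps into a single wrapper, using the same seqinr, Biostrings, msa and ape calls as above with namespace prefixes. `run_assignment()` and its argument names are hypothetical, and the wrapper assumes `seqs` and `seqnames` are built the same way as `seqs_S7` and `seqnames_S7`.

```r
# Hypothetical wrapper around the steps used above for one protein.
# seqs:     sequences as returned by seqinr::read.fasta()
# seqnames: labels to write (e.g. UniProt accessions)
# tag:      used in the file names, e.g. "S7"
# outgroup: tip label(s) used to root the tree
run_assignment <- function(seqs, seqnames, tag, outgroup) {
  seq_file <- paste0("myseq_Assignment_", tag, ".fasta")
  aln_file <- paste0("myaln_Assignment_", tag, ".fasta")

  # write, read back and align the sequences
  seqinr::write.fasta(seqs, seqnames, file.out = seq_file)
  aa <- Biostrings::readAAStringSet(filepath = seq_file)
  aln <- msa::msa(aa, method = "ClustalW")
  Biostrings::writeXStringSet(Biostrings::unmasked(aln), filepath = aln_file)

  # distances, unrooted neighbour-joining tree, then outgroup rooting
  aln_seqinr <- seqinr::read.alignment(file = aln_file, format = "fasta")
  distances <- seqinr::dist.alignment(aln_seqinr)
  unrooted <- ape::nj(distances)
  rooted <- ape::root(unrooted, outgroup = outgroup, resolve.root = TRUE)

  list(alignment = aln, distances = distances,
       unrooted = unrooted, rooted = rooted)
}
```

For example, `res_L5 <- run_assignment(seqs_L5, seqnames_L5, "L5", outgroup = "P54040")` followed by `ape::plot.phylo(res_L5$rooted)` should repeat the L5 steps.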
sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.4
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] ja_JP.UTF-8/ja_JP.UTF-8/ja_JP.UTF-8/C/ja_JP.UTF-8/ja_JP.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ape_5.4 msa_1.20.0 Biostrings_2.56.0
## [4] XVector_0.28.0 IRanges_2.22.2 S4Vectors_0.26.1
## [7] BiocGenerics_0.34.0 seqinr_3.6-1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.4.6 knitr_1.28 magrittr_1.5 zlibbioc_1.34.0
## [5] MASS_7.3-51.6 lattice_0.20-41 rlang_0.4.6 stringr_1.4.0
## [9] highr_0.8 tools_4.0.0 grid_4.0.0 nlme_3.1-148
## [13] xfun_0.14 htmltools_0.4.0 yaml_2.2.1 ade4_1.7-15
## [17] digest_0.6.25 crayon_1.3.4 evaluate_0.14 rmarkdown_2.2
## [21] stringi_1.4.6 compiler_4.0.0